Panorama Optimizations#5041

Open
aknayar wants to merge 26 commits into facebookresearch:main from aknayar:optimize-pano

Conversation

Contributor

@aknayar aknayar commented Apr 3, 2026

Note: Should be merged before #4970 (IVFPQPanorama).

Changes

Performance

This PR implements various optimizations to Panorama (L2Flat and IVFFlat).

  1. Disaggregate distance computation from pruning decisions to avoid branches in the distance-computation hot path.
  2. Terminate batch processing early when no points remain.
  3. Manually unroll the distance kernel.
  4. Template the distance computation on level width for autovectorization.
  5. Use if constexpr (C::is_max) instead of C::cmp for autovectorized pruning.
  6. Use a byteset for vectorized compaction of active indices via _pext_u64.
  7. Template distance computation and pruning on the first level (no active_indices indirection) so they can autovectorize.
  8. Hoist buffer allocations into IndexFlat/IVFFlatScannerPanorama.
  9. Expose batch_size as a parameter for IVFFlatPanorama (for consistency with IndexFlatPanorama, and because a batch_size of 1024 can improve performance).
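Items 3-4 can be sketched as follows. This is a hedged simplification with invented names (level_l2_sqr, level_l2_sqr_dispatch), not the actual FAISS kernels: templating on the level width gives the compiler a compile-time trip count, so it can fully unroll and autovectorize the loop without a runtime bound.

```cpp
#include <cstddef>

// Distance kernel templated on the level width (items 3-4, simplified):
// LevelWidth is a compile-time constant, so the compiler can unroll and
// vectorize this loop with no runtime trip-count checks.
template <size_t LevelWidth>
float level_l2_sqr(const float* x, const float* y) {
    float sum = 0.0f;
    for (size_t i = 0; i < LevelWidth; i++) {
        float d = x[i] - y[i];
        sum += d * d;
    }
    return sum;
}

// A runtime-width entry point can dispatch common widths to the
// templated kernel, keeping the fast paths vectorized.
inline float level_l2_sqr_dispatch(const float* x, const float* y, size_t w) {
    switch (w) {
        case 8:
            return level_l2_sqr<8>(x, y);
        case 16:
            return level_l2_sqr<16>(x, y);
        default: {
            // Generic fallback with a runtime trip count.
            float sum = 0.0f;
            for (size_t i = 0; i < w; i++) {
                float d = x[i] - y[i];
                sum += d * d;
            }
            return sum;
        }
    }
}
```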

Other

  • Define kDefaultBatchSize once in Panorama.h (previously defined in 5 separate locations).
  • Allow bench_flat_l2_panorama.py and bench_ivf_flat_panorama.py to accept gist1M or sift1M as the dataset to benchmark on.

Results

Together, these optimizations deliver substantial additional speedups, especially on lower-dimensional datasets like SIFT (128d), by sharply reducing Panorama's overhead:

GIST1M (IVF128, nlist=128, nlevels=16)

| nprobe | Recall@10 | Old Speedup | New Speedup | Additional Speedup |
| ------ | --------- | ----------- | ----------- | ------------------ |
| 1      | 0.1439    | 3.92x       | 3.93x       | 1.00x              |
| 2      | 0.2605    | 4.71x       | 5.19x       | 1.10x              |
| 4      | 0.4369    | 5.53x       | 6.75x       | 1.22x              |
| 8      | 0.6470    | 6.37x       | 8.21x       | 1.29x              |
| 16     | 0.8780    | 7.30x       | 9.74x       | 1.33x              |
| 32     | 0.9764    | 8.33x       | 11.29x      | 1.36x              |
| 64     | 0.9868    | 9.38x       | 12.74x      | 1.36x              |

SIFT1M (IVF128, nlist=128, nlevels=8)

| nprobe | Recall@10 | Old Speedup | New Speedup | Additional Speedup |
| ------ | --------- | ----------- | ----------- | ------------------ |
| 1      | 0.2678    | 1.20x       | 1.81x       | 1.52x              |
| 2      | 0.4584    | 1.38x       | 2.23x       | 1.62x              |
| 4      | 0.6855    | 1.59x       | 2.70x       | 1.70x              |
| 8      | 0.8760    | 1.83x       | 3.44x       | 1.88x              |
| 16     | 0.9679    | 2.11x       | 4.72x       | 2.24x              |
| 32     | 0.9855    | 2.44x       | 5.61x       | 2.30x              |
| 64     | 0.9861    | 2.74x       | 6.39x       | 2.33x              |

Raw Data

Collected by running the new benches on main and on this branch. On main you cannot specify batch_size, so remove the {1024} from the factory string in the new benches to run them there. The results above are calculated from the following raw data as follows:

  1. For each experiment (e.g., GIST (old) or SIFT (new)), calculate the Panorama speedup for each nprobe as (original ms per query) / (pano ms per query).
  2. For each pairing of (old) and (new) results, calculate the additional speedup as (new speedup) / (old speedup).

Before (main)

GIST1M:

======IVF128,Flat
	nprobe   1, Recall@10: 0.145200, speed: 2.705442 ms/query, dims scanned: 100.00%
	nprobe   2, Recall@10: 0.260800, speed: 5.456891 ms/query, dims scanned: 100.00%
	nprobe   4, Recall@10: 0.441900, speed: 10.895120 ms/query, dims scanned: 100.00%
	nprobe   8, Recall@10: 0.648200, speed: 21.676788 ms/query, dims scanned: 100.00%
	nprobe  16, Recall@10: 0.878000, speed: 43.142261 ms/query, dims scanned: 100.00%
	nprobe  32, Recall@10: 0.975400, speed: 84.498397 ms/query, dims scanned: 100.00%
	nprobe  64, Recall@10: 0.986800, speed: 160.092644 ms/query, dims scanned: 100.00%
======PCA960,IVF128,FlatPanorama16
	nprobe   1, Recall@10: 0.143900, speed: 0.689507 ms/query, dims scanned: 12.96%
	nprobe   2, Recall@10: 0.260500, speed: 1.158416 ms/query, dims scanned: 11.18%
	nprobe   4, Recall@10: 0.436900, speed: 1.968814 ms/query, dims scanned: 9.90%
	nprobe   8, Recall@10: 0.647000, speed: 3.401469 ms/query, dims scanned: 8.91%
	nprobe  16, Recall@10: 0.878000, speed: 5.912757 ms/query, dims scanned: 8.10%
	nprobe  32, Recall@10: 0.976400, speed: 10.147847 ms/query, dims scanned: 7.44%
	nprobe  64, Recall@10: 0.986800, speed: 17.074573 ms/query, dims scanned: 6.93%

SIFT1M:

======IVF128,Flat
	nprobe   1, Recall@10: 0.267480, speed: 0.285990 ms/query, dims scanned: 100.00%
	nprobe   2, Recall@10: 0.457520, speed: 0.564067 ms/query, dims scanned: 100.00%
	nprobe   4, Recall@10: 0.685320, speed: 1.111833 ms/query, dims scanned: 100.00%
	nprobe   8, Recall@10: 0.877210, speed: 2.195088 ms/query, dims scanned: 100.00%
	nprobe  16, Recall@10: 0.967730, speed: 4.338444 ms/query, dims scanned: 100.00%
	nprobe  32, Recall@10: 0.985400, speed: 8.500538 ms/query, dims scanned: 100.00%
	nprobe  64, Recall@10: 0.986100, speed: 16.349893 ms/query, dims scanned: 100.00%
======PCA128,IVF128,FlatPanorama8
	nprobe   1, Recall@10: 0.267670, speed: 0.239243 ms/query, dims scanned: 27.97%
	nprobe   2, Recall@10: 0.458320, speed: 0.408590 ms/query, dims scanned: 24.42%
	nprobe   4, Recall@10: 0.685480, speed: 0.699694 ms/query, dims scanned: 21.50%
	nprobe   8, Recall@10: 0.875930, speed: 1.197310 ms/query, dims scanned: 19.06%
	nprobe  16, Recall@10: 0.967760, speed: 2.055968 ms/query, dims scanned: 16.98%
	nprobe  32, Recall@10: 0.985370, speed: 3.481555 ms/query, dims scanned: 15.26%
	nprobe  64, Recall@10: 0.985980, speed: 5.977346 ms/query, dims scanned: 14.02%

After (optimize-pano)

GIST1M:

======IVF128,Flat
	nprobe   1, Recall@10: 0.145200, speed: 2.625779 ms/query, dims scanned: 100.00%
	nprobe   2, Recall@10: 0.260800, speed: 5.285007 ms/query, dims scanned: 100.00%
	nprobe   4, Recall@10: 0.441900, speed: 10.555867 ms/query, dims scanned: 100.00%
	nprobe   8, Recall@10: 0.648200, speed: 21.012494 ms/query, dims scanned: 100.00%
	nprobe  16, Recall@10: 0.878000, speed: 41.794143 ms/query, dims scanned: 100.00%
	nprobe  32, Recall@10: 0.975400, speed: 81.865038 ms/query, dims scanned: 100.00%
	nprobe  64, Recall@10: 0.986800, speed: 155.067333 ms/query, dims scanned: 100.00%
======PCA960,IVF128,FlatPanorama16_1024
	nprobe   1, Recall@10: 0.143900, speed: 0.668800 ms/query, dims scanned: 20.33%
	nprobe   2, Recall@10: 0.260500, speed: 1.018440 ms/query, dims scanned: 14.81%
	nprobe   4, Recall@10: 0.436900, speed: 1.563622 ms/query, dims scanned: 11.72%
	nprobe   8, Recall@10: 0.647000, speed: 2.557981 ms/query, dims scanned: 9.82%
	nprobe  16, Recall@10: 0.878000, speed: 4.292616 ms/query, dims scanned: 8.56%
	nprobe  32, Recall@10: 0.976400, speed: 7.248832 ms/query, dims scanned: 7.68%
	nprobe  64, Recall@10: 0.986800, speed: 12.171319 ms/query, dims scanned: 7.06%

SIFT1M:

======IVF128,Flat
        nprobe   1, Recall@10: 0.267480, speed: 0.295904 ms/query, dims scanned: 100.00%
        nprobe   2, Recall@10: 0.457520, speed: 0.583204 ms/query, dims scanned: 100.00%
        nprobe   4, Recall@10: 0.685320, speed: 1.150055 ms/query, dims scanned: 100.00%
        nprobe   8, Recall@10: 0.877210, speed: 2.425575 ms/query, dims scanned: 100.00%
        nprobe  16, Recall@10: 0.967730, speed: 5.509365 ms/query, dims scanned: 100.00%
        nprobe  32, Recall@10: 0.985400, speed: 10.794491 ms/query, dims scanned: 100.00%
        nprobe  64, Recall@10: 0.986100, speed: 20.727924 ms/query, dims scanned: 100.00%
======PCA128,IVF128,FlatPanorama8_1024
        nprobe   1, Recall@10: 0.267750, speed: 0.163266 ms/query, dims scanned: 34.97%
        nprobe   2, Recall@10: 0.458370, speed: 0.261109 ms/query, dims scanned: 27.97%
        nprobe   4, Recall@10: 0.685540, speed: 0.425977 ms/query, dims scanned: 23.30%
        nprobe   8, Recall@10: 0.875990, speed: 0.704580 ms/query, dims scanned: 19.98%
        nprobe  16, Recall@10: 0.967860, speed: 1.167465 ms/query, dims scanned: 17.45%
        nprobe  32, Recall@10: 0.985470, speed: 1.925296 ms/query, dims scanned: 15.50%
        nprobe  64, Recall@10: 0.986080, speed: 3.245793 ms/query, dims scanned: 14.14%

@meta-cla meta-cla bot added the CLA Signed label Apr 3, 2026
@aknayar aknayar marked this pull request as draft April 3, 2026 22:43
}

float lower_bound = exact_distances[idx] - cauchy_schwarz_bound;
if constexpr (C::is_max) {
Contributor Author

@aknayar aknayar Apr 4, 2026


Unfortunately C::cmp() kills autovectorization here so we resort to this workaround.
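The contrast can be illustrated with a minimal sketch (hypothetical simplification of the loop above; the names CMaxLike and prune are invented). With if constexpr the comparison resolves at compile time to a simple vectorizable compare, whereas an opaque comparator call like C::cmp() is a function the autovectorizer cannot see through.

```cpp
#include <cstddef>

// Comparator in the style of faiss::CMax: is_max selects the comparison
// direction as a compile-time constant.
struct CMaxLike {
    static constexpr bool is_max = true; // keep smallest distances
};

// Branch-free pruning pass: compute each lower bound, decide keep/drop
// with an `if constexpr`-selected compare, and compact survivors without
// a data-dependent branch.
template <typename C>
size_t prune(
        const float* exact_distances,
        const float* bounds,
        float threshold,
        size_t n,
        size_t* out) {
    size_t n_kept = 0;
    for (size_t idx = 0; idx < n; idx++) {
        float lower_bound = exact_distances[idx] - bounds[idx];
        bool keep;
        if constexpr (C::is_max) {
            keep = lower_bound < threshold;
        } else {
            keep = lower_bound > threshold;
        }
        out[n_kept] = idx; // unconditional store, advanced only if kept
        n_kept += keep;
    }
    return n_kept;
}
```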

write_ivf_header(ivfp, f);
WRITE1(ivfp->n_levels);
WRITE1(ivfp->batch_size);
if (ivfp->batch_size == Panorama::kDefaultBatchSize) {
Contributor Author


For backward compatibility.

@aknayar aknayar marked this pull request as ready for review April 4, 2026 19:34
* accelerating the refinement stage.
*/
struct Panorama {
static constexpr size_t kDefaultBatchSize = 128;
Contributor Author


I'm considering defining kLegacyDefaultBatchSize = 128 and kDefaultBatchSize = 1024 to update the default and have a fallback for the old indexes which were created with 128. Is such a change in default behavior allowed (IVF128,FlatPanorama8 would then silently use 1024 batch_size instead of 128)?

}

template <typename Lambda>
inline auto with_bool(bool value, Lambda&& fn) {
Contributor Author


I'm curious if there's a more appropriate location to define this.
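A common shape for such a helper, shown as a hedged sketch (the definition in this PR may differ): the caller passes a generic lambda, and the helper invokes it with std::true_type or std::false_type so the lambda body sees the flag as a compile-time constant.

```cpp
#include <type_traits>
#include <utility>

// Dispatch a runtime bool to a compile-time constant: the generic
// lambda is instantiated twice, once with std::true_type and once with
// std::false_type, so inside `fn` the value is usable in constexpr
// contexts (e.g. as a template argument).
template <typename Lambda>
inline auto with_bool(bool value, Lambda&& fn) {
    if (value) {
        return std::forward<Lambda>(fn)(std::true_type{});
    } else {
        return std::forward<Lambda>(fn)(std::false_type{});
    }
}
```

A typical call site would look like `with_bool(is_first_level, [&](auto first) { scan_batch<first.value>(...); });`, turning one runtime flag into two specialized instantiations.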

# All modern CPUs support F, CD, VL, DQ, BW extensions.
# Ref: https://en.wikipedia.org/wiki/AVX512
-target_compile_options(faiss_avx512 PRIVATE $<$<COMPILE_LANGUAGE:CXX>:-mavx2 -mfma -mf16c -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mpopcnt>)
+target_compile_options(faiss_avx512 PRIVATE $<$<COMPILE_LANGUAGE:CXX>:-mavx2 -mfma -mf16c -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mpopcnt ${FAISS_BMI2_FLAGS}>)
Contributor Author


Will have to add this to avx512_spr as well once #5034 goes in.
